Abstract

We consider topological indices $I$ that are sums of $f(\deg(u))\,f(\deg(v))$,
where $\{u, v\}$ are adjacent vertices and $f$ is a function. The Randić
connectivity index and the Zagreb group index are examples of indices of this
kind. In earlier work on topological indices that are sums of independent
random variables, we identified the correlation between $I$ and the edge set
of the molecular graph as the main cause for correlated indices. We prove
a necessary and sufficient condition for $I$ having zero covariance with the
edge set.



1 Introduction 

For quite some time it has been known that topological indices (graph invariants 
on molecular graphs) exhibit considerable mutual correlation [1, 2]. This is
a major problem when performing structure-activity studies, as the employed
statistical methods may fail or give results of little meaning on sets of
correlated data. Also, strong correlations among a set of topological indices
raise doubt as to whether these indices describe different and meaningful
biological, chemical or physical properties of molecules.

In an attempt to investigate the reasons for these correlations, we used
random graphs [3] as a model for chemical graphs and considered topological
indices of the form

\[ I_X(G) = \sum_{\{u,v\} \in E} X_u X_v \]

where $E$ is the edge set of the molecular graph $G = (V, E)$ and
$\{X_v \mid v \in V\}$ is a set of independent random variables with a common
expectation $E(X)$ [4, 5, 6]. We proved that $I_X$, $I_Y$, and $I_1$ are
linearly dependent for independent vertex properties $X$, $Y$ with
$E(X), E(Y) > 0$ as the number of vertices tends to infinity. For
$E(X) = E(Y) = 0$, however, these indices are uncorrelated. Here, $I_1$
denotes a topological index with $X_v = 1$ for all $v \in V$.

While the random graph model we used in [6] encompasses graphs of arbitrary
structure, including chemical graphs, the notion of vertex (or atom)
properties $X_v$ that are independent of the molecular graph is a serious
abstraction from computational chemistry, where atom properties used for
topological indices are a function of the graph or even the molecule.

In this paper, we use a slightly more general random graph model than the 
one used in [4]. In particular, we consider graphs on n vertices whose edges 
are chosen independently with a probability proportional to 1/n. The latter 
ensures that the expected number of edges increases linearly in the number of 
vertices. We use this to model an approximately linear relation of bonds to 
vertices present in molecules. For example, homologous series of aliphatic or 
aromatic hydrocarbons with n atoms contain n + c bonds for some constant c. 
Polyphenols contain $\frac{7}{6} n + c$ bonds as each monomer adds 6 atoms and
7 bonds. On the other hand, there is some variation in the number of bonds for
a given number of atoms in a heterogeneous set of molecules, which is also
true for the random graph model.

As a more significant difference, we consider the vertex properties $X_v$ to
be a function of the vertex degree instead of being independent. Thus, our
results are valid for important topological indices such as the Randić
connectivity index or the Zagreb group index. We will focus on the crucial
covariance between $I_X$ and $I_1$.



2 Preliminaries 

First, we describe the random graph model. For a graph $(V, E)$ let

\[ 1_{uv} = 1_{\{\{u,v\} \in E\}} =
   \begin{cases} 1 & \text{if } \{u,v\} \in E \\ 0 & \text{else} \end{cases} \]

be the indicator function for $\{\{u, v\} \in E\}$. For $V = \{1, \ldots, n\}$
let $1_{uv}$ $(u, v \in V)$ be independent random variables with
$P(1_{uv} = 1) = p$. The space of random graphs $\mathcal{G}(n,p)$ can be
identified with the distribution of $(1_{uv})_{u,v \in V}$. We set $p = a/n$
for a fixed parameter $a > 0$ so that
$E|E| = \binom{n}{2} p \sim \frac{a}{2} n$, as motivated in the introduction.
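The model can be illustrated with a short Python sketch. It is not part of the original work: the names `sample_graph` and `expected_edges` are hypothetical, and the sampler simply draws each of the $\binom{n}{2}$ possible edges independently with probability $p = a/n$ using a seeded pseudo-random generator.

```python
import random
from itertools import combinations


def sample_graph(n, a, seed=0):
    """Sample a graph on vertices 1..n; each edge is present with p = a/n."""
    rng = random.Random(seed)
    p = a / n
    return [(u, v) for u, v in combinations(range(1, n + 1), 2)
            if rng.random() < p]


def expected_edges(n, a):
    """E|E| = C(n,2) * p = a * (n - 1) / 2, which grows linearly in n."""
    return a * (n - 1) / 2


edges = sample_graph(200, 2.0)
print(len(edges), expected_edges(200, 2.0))
```

The sampled edge count fluctuates around `expected_edges(n, a)`, which mirrors the remark that the number of bonds varies for a given number of atoms.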

To describe the vertex properties, let $f : \mathbb{N}_0 \to \mathbb{R}$ be a
function with $f(0) = 0$. We consider the topological index

\[ I_X = I_X(G) = \sum_{\{u,v\} \in E(G)} X_u X_v \tag{2.1} \]

with

\[ X_v = f(\deg(v)) \]

being the vertex properties, where $G \in \mathcal{G}(n, a/n)$ is a random
graph. Thus, $f(0) = 0$ accounts for isolated vertices being ignored. Using
indicators this can be written as

\[ I_X = \sum_{u<v} X_u X_v 1_{uv} \tag{2.2} \]

which is better suited to employ the expectation operator.
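As a small illustration of (2.1), the following Python sketch evaluates the index for an explicit edge list. The function names are hypothetical. The two choices of $f$ are assumptions made for this example: $f(d) = d$ gives a sum of degree products (Zagreb type) and $f(d) = d^{-1/2}$ with $f(0) = 0$ gives the Randić connectivity index.

```python
import math
from collections import Counter


def topological_index(edges, f):
    """Return I_X = sum over edges {u, v} of f(deg(u)) * f(deg(v))."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(f(degree[u]) * f(degree[v]) for u, v in edges)


def zagreb(d):
    """Assumed choice f(d) = d."""
    return d


def randic(d):
    """Assumed choice f(d) = d^(-1/2) for d >= 1, with f(0) = 0."""
    return 1 / math.sqrt(d) if d > 0 else 0.0


path = [(1, 2), (2, 3), (3, 4)]          # vertex degrees 1, 2, 2, 1
print(topological_index(path, zagreb))   # 1*2 + 2*2 + 2*1 = 8
print(topological_index(path, randic))   # 1/sqrt(2) + 1/2 + 1/sqrt(2)
```

With the constant function $f = 1$ the same routine returns the number of edges, which is the index $I_1$.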
We use the following notations throughout the text:

$O(f)$ denotes a function $g$ with $g(x) \le c f(x)$ for all large $x$ and
some constant $c > 0$

$X_n \xrightarrow{D} X$ denotes that random variable $X_n$ converges to $X$
in distribution

$a_n \nearrow a$ denotes that sequence $(a_n)$ is monotonically increasing
and converges to $a$

3 Expectations and Covariance 

To determine expectation values, we have to eliminate the dependence among
$X_u$ and $X_v$ in (2.2). This is achieved by conditioning on
$\{1_{uv} = 1\}$. If the edge $\{u, v\}$ exists then the degree of $u$ has no
effect on the degree of $v$ and vice versa:

Lemma 1.

Suppose $u < v$. Then the random variables $(1_{uu'})_{u'>u}$ and
$(1_{vv'})_{v'>v}$ are independent with respect to the probability measure
$P(\,\cdot \mid 1_{uv} = 1)$. The same claim holds for $\deg(u)$ and
$\deg(v)$.



Figure 1: We fix edge $\{u,v\}$

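Proof. Let $a_{uu'}, a_{vv'} \in \{0, 1\}$ for $u' > u$, $v' > v$ and
$a_{uv} = 1$. We check that for $(a_{uu'})_{u'>u}$, $(a_{vv'})_{v'>v}$ holds

\[
\begin{aligned}
& P\big((1_{uu'})_{u'>u} = (a_{uu'})_{u'>u} \wedge
        (1_{vv'})_{v'>v} = (a_{vv'})_{v'>v} \mid 1_{uv} = 1\big) \\
&\quad = \frac{P\big((1_{uu'})_{u'>u} = (a_{uu'})_{u'>u}\big)\,
               P\big((1_{vv'})_{v'>v} = (a_{vv'})_{v'>v}\big)}
              {P(1_{uv} = 1)} \\
&\quad = P\big((1_{uu'})_{u'>u} = (a_{uu'})_{u'>u} \mid 1_{uv} = 1\big)\,
         P\big((1_{vv'})_{v'>v} = (a_{vv'})_{v'>v} \mid 1_{uv} = 1\big)
\end{aligned}
\]

since the indicators are independent and $1_{uv}$ belongs to the first
family. If $a_{uv} = 0$, both sides are zero. The second claim is a
consequence of $\deg(u)$ and $\deg(v)$ being functions of $1_{uu'}$ and
$1_{vv'}$, respectively. □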
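The degree statement of lemma 1 can be checked exactly on a small example. The sketch below is an illustration only, with hypothetical function names: it enumerates all graphs on $n = 4$ vertices with exact rational weights and tests whether the joint law of $(\deg(1), \deg(2))$ factorizes, once conditioned on $1_{12} = 1$ and once without conditioning.

```python
from fractions import Fraction
from itertools import combinations, product


def degree(present, w):
    """Degree of vertex w, given a dict mapping vertex pairs to 0/1."""
    return sum(b for e, b in present.items() if w in e)


def degree_joint(n, p, condition_on_edge):
    """Exact joint law of (deg(1), deg(2)), optionally given 1_12 = 1."""
    pairs = list(combinations(range(1, n + 1), 2))
    joint, total = {}, Fraction(0)
    for bits in product((0, 1), repeat=len(pairs)):
        present = dict(zip(pairs, bits))
        if condition_on_edge and not present[(1, 2)]:
            continue
        weight = Fraction(1)
        for b in bits:
            weight *= p if b else 1 - p
        key = (degree(present, 1), degree(present, 2))
        joint[key] = joint.get(key, 0) + weight
        total += weight
    return {k: w / total for k, w in joint.items()}


def is_independent(joint):
    """Check P(d1, d2) = P(d1) * P(d2) for every pair of values."""
    m1, m2 = {}, {}
    for (d1, d2), w in joint.items():
        m1[d1] = m1.get(d1, 0) + w
        m2[d2] = m2.get(d2, 0) + w
    return all(joint.get((d1, d2), 0) == m1[d1] * m2[d2]
               for d1 in m1 for d2 in m2)


p = Fraction(1, 3)
print(is_independent(degree_joint(4, p, condition_on_edge=True)))
print(is_independent(degree_joint(4, p, condition_on_edge=False)))
```

Without conditioning, both degrees share the indicator $1_{12}$ and are therefore dependent. Fixing the edge removes exactly this shared term.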

We are going to apply lemma 1 to conditional expectations. This motivates
the definition

\[ \delta_f^{(k)} = E(X_1 \mid 1_{12} 1_{13} \cdots 1_{1,k+1} = 1),
   \quad k \ge 0 \tag{3.1} \]

We shall see later why we also need $k > 1$. For symmetry reasons, this could
as well be defined for a vertex $v \ne 1$ and any set of distinct vertices
$\{u_2, \ldots, u_{k+1}\}$ different from $v$. As we shall see in section 4,
$\lim_{n\to\infty} \delta_f^{(k)}$ exists and is a function of $a$ if $f$
satisfies a condition. Thus, we may regard $\delta_f^{(k)}$ as almost constant
for large $n$.

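Lemma 2.

\[ E(I_X) = \big(\delta_f^{(1)}\big)^2 E|E| \]

Proof.

\[
\begin{aligned}
E(I_X) &= \sum_{u<v} E(X_u X_v \mid 1_{uv} = 1)\, p && \text{by (2.2)} \\
&= \sum_{u<v} E(X_u \mid 1_{uv} = 1)\, E(X_v \mid 1_{uv} = 1)\, p
   && \text{by lemma 1} \\
&= \big(\delta_f^{(1)}\big)^2 E|E| && \text{by (3.1)}
\end{aligned}
\]

□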



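Lemma 3.

\[ E(I_X I_1) = \left[ \big(\delta_f^{(1)}\big)^2
   \left( \binom{n-2}{2} p + 1 \right)
   + 2\, \delta_f^{(1)} \delta_f^{(2)} (n-2)\, p \right] E|E| \]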



Proof. To dissect the sum

\[ E(I_X I_1) = \sum_{u<v} \sum_{u'<v'} E(X_u X_v 1_{uv} 1_{u'v'}) \]

according to $|\{u,v\} \cap \{u',v'\}|$ consider

\[ S_k = \{ (u, v, u', v') \mid u < v \wedge u' < v' \wedge
   |\{u,v\} \cap \{u',v'\}| = k \}, \quad 0 \le k \le 2 \]

Then

\[ |S_0| = \binom{n}{2} \binom{n-2}{2} \tag{3.2} \]

\[ |S_1| = 6 \binom{n}{3} \tag{3.3} \]

\[ |S_2| = \binom{n}{2} \tag{3.4} \]

(3.2) and (3.4) are obvious. To verify (3.3) let $(u,v,u',v') \in S_1$.
Exactly two numbers are equal as indicated in figure 2. Cases (a), (b) allow
just one way to distribute three distinct numbers on $u, v, u', v'$ while
there are two ways for cases (c), (d).

Figure 2: Possibilities for $(u, v, u', v') \in S_1$, cases (a) to (d)

For symmetry reasons,
$E(X_u X_v 1_{uv} 1_{u'v'}) = E(X_1 X_2 1_{12} 1_{13})$ for all
$(u,v,u',v') \in S_1$. Hence, we get

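\[
\begin{aligned}
E(I_X I_1) ={}& |S_0|\, E(X_1 X_2 1_{12} 1_{34}) \\
&+ |S_1|\, E(X_1 X_2 1_{12} 1_{13}) \\
&+ |S_2|\, E(X_1 X_2 1_{12}^2) \\
={}& |S_0|\, E(X_1 X_2 \mid 1_{12} = 1)\, p^2 \\
&+ |S_1|\, E(X_1 X_2 1_{13} \mid 1_{12} = 1)\, p \\
&+ |S_2|\, E(X_1 X_2 \mid 1_{12} = 1)\, p \\
={}& |S_0|\, E(X_1 \mid 1_{12} = 1)\, E(X_2 \mid 1_{12} = 1)\, p^2 \\
&+ |S_1|\, E(X_1 1_{13} \mid 1_{12} = 1)\, E(X_2 \mid 1_{12} = 1)\, p \\
&+ |S_2|\, E(X_1 \mid 1_{12} = 1)\, E(X_2 \mid 1_{12} = 1)\, p
\end{aligned}
\tag{3.5}
\]

by lemma 1. With

\[ E(X_1 1_{13} \mid 1_{12} = 1) = \frac{1}{p} E(X_1 1_{12} 1_{13})
   = \delta_f^{(2)} p, \qquad
   \binom{n}{3} \Big/ \binom{n}{2} = \frac{n-2}{3} \]

and (3.2)-(3.4), (3.5), we have

\[
\begin{aligned}
E(I_X I_1) ={}& \big(\delta_f^{(1)}\big)^2 E|E| \binom{n-2}{2} p \\
&+ 2\, \delta_f^{(1)} \delta_f^{(2)} E|E|\, (n-2)\, p \\
&+ \big(\delta_f^{(1)}\big)^2 E|E|
\end{aligned}
\]

□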
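The counts (3.2) to (3.4) used in the proof of lemma 3 can be confirmed by brute force for small $n$. The following Python sketch (the function name `count_S` is hypothetical) enumerates all ordered pairs of vertex pairs and groups them by the size of their intersection.

```python
from itertools import combinations
from math import comb


def count_S(n):
    """Count tuples (u, v, u', v') with u < v, u' < v' by overlap size k."""
    pairs = list(combinations(range(1, n + 1), 2))
    counts = [0, 0, 0]
    for e in pairs:
        for e2 in pairs:
            counts[len(set(e) & set(e2))] += 1
    return counts


n = 6
print(count_S(n))
print([comb(n, 2) * comb(n - 2, 2), 6 * comb(n, 3), comb(n, 2)])
```

Both lines print the same three numbers, and the three counts add up to $\binom{n}{2}^2$, the total number of terms in the double sum.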
Remark. With $f = 1$, lemma 2, lemma 3 and the help of Mathematica yield
$\mathrm{Var}(I_1) = E|E|\,(1 - p)$, as it should be.

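We combine the results of this section in

Theorem 4.

If $\delta_f^{(1)}$, $\delta_f^{(2)}$ are bounded in $n$ then

\[ \mathrm{Cov}(I_X, I_1) =
   \begin{cases}
   0 & \text{if } \delta_f^{(1)} = 0 \\
   \left[ \big(\delta_f^{(1)}\big)^2 (1 - 2a)
          + 2a\, \delta_f^{(1)} \delta_f^{(2)} + O(1/n) \right] E|E|
     & \text{else}
   \end{cases} \]

Proof. By lemma 2 and lemma 3,

\[ \mathrm{Cov}(I_X, I_1) = E(I_X I_1) - E(I_X)\, E(I_1)
   = \left[ \big(\delta_f^{(1)}\big)^2
     \left( 1 + \left( \binom{n-2}{2} - \binom{n}{2} \right) p \right)
     + 2\, \delta_f^{(1)} \delta_f^{(2)} (n-2)\, p \right] E|E| \]

Using $\binom{n-2}{2} - \binom{n}{2} = 3 - 2n$, this can be written as

\[ \mathrm{Cov}(I_X, I_1) =
   \begin{cases}
   0 & \text{if } \delta_f^{(1)} = 0 \\
   \left[ \big(\delta_f^{(1)}\big)^2 (1 + (3 - 2n)\, p)
          + 2\, \delta_f^{(1)} \delta_f^{(2)} (n-2)\, p \right] E|E|
     & \text{else}
   \end{cases} \]

The assertion follows with $p = a/n$.

□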
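The finite-$n$ covariance obtained from lemma 2 and lemma 3 can be compared with an exact enumeration over all graphs on a few vertices. The sketch below is an illustration under stated assumptions: the function names are hypothetical, $f(d) = d$ is an arbitrary example, and $\delta_f^{(k)}$ is evaluated as $E f(k + B)$ with $B$ binomially distributed with $n - 1 - k$ trials, which is what (3.1) amounts to for independent edges.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def exact_moments(n, p, f):
    """Enumerate all graphs; return E(I_X), E(I_1), E(I_X * I_1)."""
    pairs = list(combinations(range(n), 2))
    e_x = e_1 = e_x1 = Fraction(0)
    for bits in product((0, 1), repeat=len(pairs)):
        edges = [e for e, b in zip(pairs, bits) if b]
        weight = p ** len(edges) * (1 - p) ** (len(pairs) - len(edges))
        deg = [0] * n
        for u, v in edges:
            deg[u] += 1
            deg[v] += 1
        i_x = sum(f(deg[u]) * f(deg[v]) for u, v in edges)
        e_x += weight * i_x
        e_1 += weight * len(edges)
        e_x1 += weight * i_x * len(edges)
    return e_x, e_1, e_x1


def delta(n, p, f, k):
    """delta_f^(k) = E f(k + B), B binomial with n - 1 - k trials."""
    m = n - 1 - k
    return sum(f(k + j) * comb(m, j) * p ** j * (1 - p) ** (m - j)
               for j in range(m + 1))


def cov_formula(n, p, f):
    """Finite-n covariance as it follows from lemma 2 and lemma 3."""
    d1, d2 = delta(n, p, f, 1), delta(n, p, f, 2)
    mean_edges = comb(n, 2) * p
    return (d1 ** 2 * (1 + (3 - 2 * n) * p)
            + 2 * d1 * d2 * (n - 2) * p) * mean_edges


n, p, f = 5, Fraction(2, 5), (lambda d: d)   # p = a/n with a = 2
e_x, e_1, e_x1 = exact_moments(n, p, f)
print(e_x1 - e_x * e_1 == cov_formula(n, p, f))
```

Rational arithmetic is used so that the comparison is exact and not subject to rounding.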



Remark. We will prove in theorem 5 in section 4 that all $\delta_f^{(k)}$ are
in fact bounded in $n$ if $f \in O(x)$.

Yet, it is not clear whether $\mathrm{Cov}(I_X, I_1) \ne 0$ for
$\delta_f^{(1)} \ne 0$. This is dealt with in the next section.



4 The Poisson Distribution and $\delta_f^{(k)}$

For the proof of the following theorem recall that for random variables
$X_n$, $X$ holds

\[ X_n \xrightarrow{D} X \]

iff

\[ E(f(X_n)) \to E(f(X)) \]

for all bounded and continuous functions $f : \mathbb{R} \to \mathbb{R}$.
This does not hold for arbitrary unbounded functions $f$. Therefore, we
require that $f \in O(x)$ in this section. While this does not seem to be the
most general restriction, it facilitates the following elaborations.

Theorem 5.

For all $f \in O(x)$ and all $k$ holds

\[ \lim_{n\to\infty} \delta_f^{(k)} = E(f(k + \pi_a))
   = \sum_{j=0}^{\infty} f(k+j)\, \frac{a^j}{j!}\, e^{-a} \]

where $\pi_a$ is the Poisson distribution with parameter $a$.
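Proof. By definition (3.1),

\[
\begin{aligned}
\delta_f^{(k)} &= E(f(\deg(1)) \mid 1_{12} 1_{13} \cdots 1_{1,k+1} = 1) \\
&= E\Big( f\Big( k + \sum_{j=k+2}^{n} 1_{1j} \Big) \Big)
\end{aligned}
\tag{4.1}
\]

Since $p = a/n$, Poisson's limit theorem gives

\[ \sum_{j=k+2}^{n} 1_{1j} \xrightarrow{D} \pi_a \quad (n \to \infty) \]

The function $f : \mathbb{N}_0 \to \mathbb{R}$ can be extended to a
continuous function $f : \mathbb{R} \to \mathbb{R}$ in an arbitrary way.
Hence, the continuity theorem gives

\[ f\Big( k + \sum_{j=k+2}^{n} 1_{1j} \Big) \xrightarrow{D} f(k + \pi_a)
   \quad (n \to \infty) \]

For all bounded and continuous functions $f^*$ follows by (4.1)

\[ \delta_{f^*}^{(k)} \to E(f^*(k + \pi_a)) \quad (n \to \infty) \tag{4.2} \]

If $f$ is also bounded the claim follows. To prove (4.2) for unbounded $f$ we
cut $f$ off above a limit to divide $f$ into a bounded and an unbounded part.
We show that the unbounded part tends to zero as the limit tends to infinity.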

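To begin with, let $|f(x)| \le x$ for all $x$ and let $f$ be unbounded. Then
there is a sequence of integers $(m_l)$ such that without loss of generality
$f(m_l) \nearrow \infty$ for $l \to \infty$ and $f(m_l) > 0$ for all $l$. Let

\[ c_m(x) = \begin{cases} x & \text{if } |x| \le m \\ 0 & \text{else}
   \end{cases} \]

and

\[ \bar c_m(x) = \begin{cases} 0 & \text{if } |x| \le m \\ x & \text{else}
   \end{cases} \]

Let $S_n := k + \sum_{j=k+2}^{n} 1_{1j}$. Then

\[
\begin{aligned}
|E((\bar c_{m_l} \circ f)(S_n))|
&= |E(f(S_n)\, 1_{\{f(S_n) > m_l\}})| \\
&\le E(S_n\, 1_{\{f(S_n) > f(m_l)\}})
   && \text{since } 0 < f(m_l) \le m_l \\
&= E(S_n\, 1_{\{S_n > m_l\}})
   && \text{since } f(m_l) \text{ increases monotonically} \\
&\le \frac{1}{m_l}\, E(S_n^2) \\
&= \frac{1}{m_l} \left[ \mathrm{Var}(S_n) + (E(S_n))^2 \right] \\
&\le \frac{1}{m_l} \left[ O(n)\, p (1-p) + (O(n)\, p)^2 \right] \\
&= O(1/m_l)
\end{aligned}
\]

By linearity of expectation follows

\[ E((\bar c_{m_l} \circ f)(S_n)) = O(1/m_l) \tag{4.3} \]

for all $f \in O(x)$. Thus,

\[
\begin{aligned}
\lim_{n\to\infty} \delta_f^{(k)}
&= \lim_{l\to\infty} \lim_{n\to\infty} E(f(S_n)) && \text{by (4.1)} \\
&= \lim_{l\to\infty} \lim_{n\to\infty}
   \left[ E((c_{m_l} \circ f)(S_n)) + E((\bar c_{m_l} \circ f)(S_n)) \right] \\
&= \lim_{l\to\infty}
   \left[ E((c_{m_l} \circ f)(k + \pi_a)) + O(1/m_l) \right]
   && \text{by (4.2) and (4.3)} \\
&= E(f(k + \pi_a))
\end{aligned}
\]

by the convergence theorem of Lebesgue. □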
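The convergence stated in theorem 5 can be observed numerically. The following Python sketch is an illustration with hypothetical function names: `delta_finite` evaluates the finite-$n$ expectation in (4.1) from the binomial probabilities, `delta_limit` evaluates a truncated version of the Poisson series, and the Randić-type choice $f(d) = d^{-1/2}$, $f(0) = 0$ is an assumption made for the example.

```python
import math


def delta_finite(f, k, n, a):
    """E f(k + B) with B ~ Binomial(n - 1 - k, a/n), as in (4.1)."""
    m, p = n - 1 - k, a / n
    pmf = (1 - p) ** m                            # P(B = 0)
    total = 0.0
    for j in range(m + 1):
        total += f(k + j) * pmf
        pmf *= (m - j) / (j + 1) * p / (1 - p)    # P(B = j + 1) from P(B = j)
    return total


def delta_limit(f, k, a, terms=200):
    """Truncated series sum_j f(k + j) * a^j / j! * e^(-a)."""
    weight = math.exp(-a)                         # Poisson weight for j = 0
    total = 0.0
    for j in range(terms):
        total += f(k + j) * weight
        weight *= a / (j + 1)
    return total


def randic(d):
    """Assumed example f(d) = d^(-1/2) for d >= 1, with f(0) = 0."""
    return 1 / math.sqrt(d) if d > 0 else 0.0


for n in (10, 100, 1000, 10000):
    print(n, delta_finite(randic, 1, n, 2.0))
print("limit", delta_limit(randic, 1, 2.0))
```

For $f(d) = d$ both quantities are available in closed form, $k + (n - 1 - k)\,a/n$ and $k + a$, which gives a simple sanity check of the two routines.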

With the help of theorem 5 we are able to answer the question raised at the
end of section 3:

Theorem 6.

For $n \to \infty$ and $f \in O(x)$ holds: $I_X$ and $I_1$ have covariance
zero if and only if $\lim_{n\to\infty} \delta_f^{(1)} = 0$.

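Proof. Assume that $\lim_{n\to\infty} \delta_f^{(1)} \ne 0$ and
$\lim_{n\to\infty} \mathrm{Cov}(I_X, I_1) = 0$. By theorem 4 follows

\[ \lim_{n\to\infty} \frac{\delta_f^{(2)}}{\delta_f^{(1)}}
   = 1 - \frac{1}{2a} \]

With theorem 5 we get

\[ \sum_{j=0}^{\infty} f(2+j)\, \frac{a^j}{j!}
   = \left( 1 - \frac{1}{2a} \right)
     \sum_{j=0}^{\infty} f(1+j)\, \frac{a^j}{j!} \]

We multiply by $a$ and substitute $j$ with $j - 1$ in the first two series to
get

\[ \sum_{j=1}^{\infty} f(1+j)\, \frac{a^j}{(j-1)!}
   = \sum_{j=1}^{\infty} f(j)\, \frac{a^j}{(j-1)!}
   - \frac{1}{2} \sum_{j=0}^{\infty} f(1+j)\, \frac{a^j}{j!} \]

Hence,

\[ \sum_{j=1}^{\infty} \frac{1}{(j-1)!}
   \left[ \big( f(j+1) - f(j) \big)\, a^j
          + \tfrac{1}{2}\, f(j)\, a^{j-1} \right] = 0 \]

By theorem 5, this series converges for all $a > 0$. By the identity theorem
for power series follows that all coefficients are zero. By induction thus
follows $f = 0$, which contradicts
$\lim_{n\to\infty} \delta_f^{(1)} \ne 0$.

The opposite direction follows by theorem 4 and theorem 5. □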

5 Discussion 

We have seen that $\delta_f^{(k)}$ is an important quantity for the covariance
of the topological indices we consider. Theorem 5 shows that
$\delta_f^{(k)}$ does not depend on $n$ for large $n$. This justifies
definition (3.1) since we do not want $\delta_f^{(k)}$ to be very different
for graphs of different size. Also, theorem 5 provides a way to approximately
compute $\delta_f^{(k)}$. If we substitute $X_v$ by $X_v - \delta_f^{(1)}$ in
(2.1), the resulting index is uncorrelated with $I_1$.

As a drawback, we require $f \in O(x)$ in section 4. Theorem 5 may not be
valid if $f$ increases very steeply. However, it should be possible to derive
an upper limit similar to (4.3) for functions $f$ with a higher rate of growth
than $O(x)$.

In [6], we proved that topological indices (with independent vertex proper- 
ties) are necessarily correlated if the vertex properties have expectations not 
equal to zero. Theorem 6 does not give this result as it is an assertion on co- 
variance only. The next step will therefore be an examination of correlations 
within this setting. 